This lab provides a basic introduction to high-level synthesis using the Vitis HLS tool flow. You will use Vitis HLS in GUI mode to create a project. You will simulate, synthesize, and implement the provided design.
After completing this lab, you will be able to:
Launch Vitis HLS: Select Start > Xilinx Design Tools > Vivado 2021.2 > Vitis HLS 2021.2
You can also invoke Vitis HLS from the Vitis HLS Command Prompt by selecting Start > Xilinx Design Tools > Vitis HLS 2021.2 Command Prompt and then typing vitis_hls in the terminal.
Getting Started view of Vitis HLS
Click the Browse… button of the Location field and browse to {labs}\lab1 on a Windows machine or {labs}/lab1 on a Linux machine, creating sub-folders as necessary, and then click OK.
Note: From this point onward, paths are given in their Linux form.
New Vitis HLS Project wizard
Using Parts Specify option in Part Selection Dialog
Explorer Window
The Design under consideration
The design is a matrix multiplication implementation consisting of three nested loops. The Product loop is the innermost loop and performs the actual multiply-and-accumulate of the matrix elements. The Col loop is the middle loop, which feeds the next column element data, together with the passed row element data, to the Product loop. Finally, Row is the outermost loop. The res[i][j]=0 statement (line 41) resets the result every time a new row element is passed and a new column element is used.
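The loop structure described above can be sketched as follows. This is a minimal illustration, not the lab's actual matrixmul.cpp: the 3x3 dimensions, the element types, and the names DIM, mat_t, and res_t are assumptions made so that the example is self-contained. Only the loop labels Row, Col, and Product and the res[i][j] = 0 reset are taken from the description.

```cpp
#include <cstdint>

// Assumed for illustration: square 3x3 matrices with 8-bit inputs
// accumulated into 16-bit results. The lab's header may differ.
constexpr int DIM = 3;
using mat_t = std::int8_t;
using res_t = std::int16_t;

void matrixmul(mat_t a[DIM][DIM], mat_t b[DIM][DIM], res_t res[DIM][DIM]) {
    // Outermost loop: one iteration per row of a.
    Row: for (int i = 0; i < DIM; ++i) {
        // Middle loop: one iteration per column of b.
        Col: for (int j = 0; j < DIM; ++j) {
            // Reset the accumulator before each new row/column pair.
            res[i][j] = 0;
            // Innermost loop: multiply-and-accumulate along the shared dimension.
            Product: for (int k = 0; k < DIM; ++k) {
                res[i][j] += a[i][k] * b[k][j];
            }
        }
    }
}
```

The labels have no effect in plain C++ (a compiler may warn that they are unused), but they give each loop a name that can be matched against the loop names used in the text.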
Program output
Double-click on matrixmul_test.cpp under testbench folder in the Explorer to see the content.
You should see two input matrices initialized with some values, followed by the code that executes the algorithm. If HW_COSIM is defined (as was done during the project set-up), the testbench calls the matrixmul function, compares the result it computed itself with the one returned by the function, and prints Test passed if the results match. If HW_COSIM is not defined, the testbench simply outputs its own computed result and does not call the matrixmul function.
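The testbench behaviour described above might look roughly like the following sketch. The function name run_testbench, the 3x3 size, the input values, and the stand-in matrixmul body are all assumptions for illustration; the lab's matrixmul_test.cpp will differ in detail. HW_COSIM is defined inline here only to keep the sketch self-contained, whereas in the lab it is defined during project set-up.

```cpp
#include <iostream>

// Assumption: defined inline so the sketch is self-contained. In the lab,
// HW_COSIM is defined during project set-up instead.
#define HW_COSIM

constexpr int N = 3;  // assumed matrix dimension

// Hypothetical stand-in for the function under test.
void matrixmul(int a[N][N], int b[N][N], int res[N][N]) {
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            res[i][j] = 0;
            for (int k = 0; k < N; ++k) {
                res[i][j] += a[i][k] * b[k][j];
            }
        }
    }
}

// Returns the number of mismatching elements (0 means the test passed).
int run_testbench() {
    int a[N][N] = {{11, 12, 13}, {14, 15, 16}, {17, 18, 19}};
    int b[N][N] = {{21, 22, 23}, {24, 25, 26}, {27, 28, 29}};
    int sw_result[N][N];
    int err_cnt = 0;

    // Reference result computed by the testbench itself.
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            sw_result[i][j] = 0;
            for (int k = 0; k < N; ++k) {
                sw_result[i][j] += a[i][k] * b[k][j];
            }
        }
    }

#ifdef HW_COSIM
    // Call the function under test and compare element by element.
    int hw_result[N][N];
    matrixmul(a, b, hw_result);
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            if (hw_result[i][j] != sw_result[i][j]) {
                ++err_cnt;
            }
        }
    }
    std::cout << (err_cnt == 0 ? "Test passed." : "Test failed.") << std::endl;
#else
    // Without HW_COSIM, only print the software-computed result.
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            std::cout << sw_result[i][j] << (j + 1 < N ? "," : "\n");
        }
    }
#endif
    return err_cnt;
}
```

A main function would typically return the value of run_testbench() so that a non-zero exit code signals a failed comparison.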
Select Project > Run C Simulation. Select the Launch Debugger option and click OK.
The application will be compiled with the -g option to include the debugging information, the compiled application will be invoked, and the debug perspective will be opened automatically.
A Debug perspective
Scroll down in the source view and double-click in the blue margin at line 67, where the program is about to print “{” to the output console window. This sets a breakpoint at line 67.
The breakpoint is marked with a blue circle and a tick.
Debugger’s intermediate output view
Software computed result
Computed results
Report view after synthesis is completed
Explorer view after the synthesis process
Note that when the syn folder under the Solution1 folder is expanded in the Explorer view, it shows the report, verilog, and vhdl sub-folders, which contain the report files and the generated source (vhdl, verilog, header, and cpp) files. Double-clicking any of these entries opens the corresponding file in the information pane.
Also note that if the target design has hierarchical functions, reports corresponding to lower-level functions are also created.
Using the scroll bar on the right, scroll down through the report and answer the following questions.
Question 1
Answer the following questions:
Estimated clock period:
Worst case latency:
Number of DSP48E used:
Number of FFs used:
Number of LUTs used:
Generated interface signals
You can see that the ap_clk and ap_rst signals and the ap_start, ap_done, ap_idle, and ap_ready control signals are automatically added to the design by default. The control signals are used as handshaking signals to indicate when the design is ready to take the next computation command (ap_ready), when the next computation is started (ap_start), and when the computation is completed (ap_done). Other signals are generated based on the input and output signals in the design and their default or specified interfaces.
Select Solution > Open Schedule Viewer or click the Analysis button on the toolbar to open the analysis viewer.
The Analysis perspective consists of four panes as shown below. Note that the module and loop hierarchies are displayed unexpanded by default. The Module Hierarchy pane shows both the performance and area information for the entire design and can be used to navigate through the hierarchy. The Performance Profile pane is visible and shows the performance details for this level of hierarchy. The information in these two panes is similar to the information reviewed earlier in the synthesis report. The Schedule Viewer is also shown in the right-hand side pane. This view shows how the operations in this particular block are scheduled into clock cycles.
Analysis perspective
Performance matrix showing top-level Row operation
From this we can see that an add operation is performed. This addition is most likely the counter that tracks the loop iterations, and we can confirm this.
Cross probing into the source file
A C/RTL Co-simulation Dialog
Click OK to run the VHDL simulation.
The C/RTL Co-simulation will run, generating and compiling several files, and then simulating the design. It goes through three stages.
First, the C test bench is executed to generate input stimuli for the RTL design.
Second, an RTL test bench with the newly generated input stimuli is created and the RTL simulation is then performed.
Finally, the output from the RTL simulation is re-applied to the C test bench to check the results.
In the console window you can see the progress and also a message that the test passed.
This eliminates the need to write a separate test bench for the synthesized design.
Console view showing simulation progress
Once the simulation verification is completed, the simulation report tab will open showing the results. The report indicates whether the simulation passed or failed, along with the measured latency and interval. Since we selected only VHDL, results are shown for VHDL only: the latency, and the initiation interval, which indicates how many clock cycles must elapse before the next input can be provided.
Co-simulation results
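To make the distinction between latency and initiation interval concrete, the following sketch computes how many clock cycles a series of back-to-back invocations would take. The function name total_cycles is hypothetical and is not part of any Vitis HLS report or API. The formula assumes that every invocation has the same latency and that a new input is accepted exactly once per interval.

```cpp
#include <cstdint>

// latency:     cycles from accepting an input to producing its result
// interval:    cycles between accepting one input and accepting the next
// invocations: number of back-to-back invocations
std::uint64_t total_cycles(std::uint64_t latency,
                           std::uint64_t interval,
                           std::uint64_t invocations) {
    if (invocations == 0) {
        return 0;  // nothing to run
    }
    // The last invocation starts after (invocations - 1) intervals
    // and then needs one full latency to finish.
    return (invocations - 1) * interval + latency;
}
```

For example, with a hypothetical latency of 10 cycles and an interval of 4 cycles, three invocations finish after 2 x 4 + 10 = 18 cycles.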
Click on the Verilog selection option.
Optionally, you can click on the drop-down button and select the desired simulator from the available list of Vivado XSim, ModelSim, Xcelium, VCS, and Riviera.
Setting up for Verilog simulation and dump trace
When RTL verification completes, the co-simulation report automatically opens, showing that the Verilog simulation has passed (along with the measured latency and interval). In addition, because the Dump Trace option was used and Verilog was selected, two trace file entries can be seen in the Verilog simulation directory.
Explorer view after the Verilog RTL co-simulation run
The co-simulation report shows that the test passed for Verilog, along with the latency and interval results.
Cosimulation report
Full waveform showing one iteration's worth of simulation
Note that as soon as ap_start is asserted, ap_idle is de-asserted, indicating that the design is in computation mode. The ap_idle signal remains de-asserted until ap_done is asserted, indicating completion of the process. This corresponds to a latency of 24 clock cycles.
An Export RTL dialog box
With default settings (shown above), the IP packaging process will run and create a package for the Vivado IP Catalog. Another option, available from the Export Format drop-down menu, is to create a Vivado IP for System Generator.
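The way the latency is read off the waveform can be expressed as a small sketch that counts clock cycles between ap_start and ap_done in a list of per-cycle samples. The Sample struct and the measure_latency function are hypothetical helpers, not part of Vitis HLS, and the counting convention (the cycle in which ap_done is first high minus the cycle in which ap_start is first high) is an assumption; the tool's report may count from a different edge.

```cpp
#include <cstddef>
#include <vector>

// One sample per clock cycle of the two control signals of interest.
struct Sample {
    bool ap_start;
    bool ap_done;
};

// Returns the number of cycles between the first cycle with ap_start high
// and the first later cycle with ap_done high, or -1 if no such pair exists.
long measure_latency(const std::vector<Sample>& trace) {
    long start_cycle = -1;
    for (std::size_t c = 0; c < trace.size(); ++c) {
        if (start_cycle < 0) {
            if (trace[c].ap_start) {
                start_cycle = static_cast<long>(c);  // computation begins here
            }
        } else if (trace[c].ap_done) {
            return static_cast<long>(c) - start_cycle;  // computation completed
        }
    }
    return -1;  // ap_start or ap_done never observed
}
```

Under this convention, a trace in which ap_start is first high in cycle 2 and ap_done is first high in cycle 26 yields a latency of 24 cycles.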
Selecting Evaluate options
Click OK and the implementation run will begin.
You can observe the progress in the Vitis HLS Console window. It goes through several phases:
Console view
When the run is completed the implementation report will be displayed in the information pane.
Implementation results in Vitis HLS
Observe that the timing constraint was met, and note the achieved period and the type and amount of resources used.
Explorer view after the RTL Export run
Expand the verilog and vhdl sub-folders and observe that the verilog sub-folder has only the RTL file, whereas the vhdl sub-folder has several files and sub-folders because the synthesis and implementation runs were made for it.
It includes the project.xpr file (the Vivado project file), the matrixmul.xdc file (the timing constraint file), and the project.runs folder, among others.
The implementation directory
The ip folder content
In this lab, you completed the major steps of the high-level synthesis design flow using Vitis HLS. You created a project, added source files, synthesized the design, simulated the design, and implemented the design. You also learned how to use the Analysis capability to understand the scheduling and binding.
Answers for question 1:
Estimated clock period: 6.816 ns
Worst case latency: 24 clock cycles
Number of DSP48E used: 2
Number of FFs used: 66
Number of LUTs used: 365
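As a quick worked check on these answers, latency expressed in time is the cycle count multiplied by the clock period. The sketch below applies that to the values above. The helper name latency_ns is hypothetical, and using the estimated clock period rather than the project's target period is an assumption, so the resulting figure is indicative only.

```cpp
// Converts a latency in clock cycles to nanoseconds for a given clock period.
double latency_ns(unsigned cycles, double period_ns) {
    return cycles * period_ns;
}
```

With the answers above, latency_ns(24, 6.816) gives approximately 163.6 ns.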
——————————————————
Copyright© 2022, Advanced Micro Devices, Inc.